Overview

Phase 1 establishes the foundation of the EDL Pipeline by fetching the complete market dataset and fundamental metrics. This phase produces the critical master_isin_map.json file that all subsequent phases depend on.
Warning: If fetch_dhan_data.py fails, the entire pipeline stops, because it produces master_isin_map.json, which every other script requires.

Execution Order

Phase 1 runs these scripts sequentially:
  1. Fetch Master Stock List
     Script: fetch_dhan_data.py
     Fetches all NSE equity stocks in a single API call.
  2. Fetch Fundamental Data
     Script: fetch_fundamental_data.py
     Iterates through each ISIN to fetch quarterly results and financial ratios.
  3. Download Listing Dates
     Helper: curl command
     Downloads the NSE equity listing dates CSV for enrichment.

Script 1: fetch_dhan_data.py

Purpose

Fetches the complete list of NSE equity stocks (~2,775 symbols) with basic metrics and creates the master ISIN mapping file.

API Endpoint

POST https://ow-scanx-analytics.dhan.co/customscan/fetchdt

Request Payload

{
  "data": {
    "sort": "Mcap",
    "sorder": "desc",
    "count": 5000,
    "fields": [
      "Isin", "DispSym", "Mcap", "Pe", "DivYeild", "Revenue",
      "Year1RevenueGrowth", "NetProfitMargin", "YoYLastQtrlyProfitGrowth",
      "EBIDTAMargin", "volume", "PricePerchng1year", "Sym", "Sid", "FnoFlag"
    ],
    "params": [
      {"field": "OgInst", "op": "", "val": "ES"},
      {"field": "Exch", "op": "", "val": "NSE"}
    ],
    "pgno": 0
  }
}
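
The request above could be issued from Python roughly as follows. This is a minimal sketch using only the standard library, not the actual contents of fetch_dhan_data.py: the function names `build_scan_payload` and `fetch_master_list` are hypothetical, and the headers the API expects are not documented here, so they are passed in by the caller.

```python
import json
import urllib.request

SCAN_URL = "https://ow-scanx-analytics.dhan.co/customscan/fetchdt"

# Field names copied verbatim from the documented payload (including "DivYeild").
FIELDS = [
    "Isin", "DispSym", "Mcap", "Pe", "DivYeild", "Revenue",
    "Year1RevenueGrowth", "NetProfitMargin", "YoYLastQtrlyProfitGrowth",
    "EBIDTAMargin", "volume", "PricePerchng1year", "Sym", "Sid", "FnoFlag",
]


def build_scan_payload(count=5000, page=0):
    """Build the documented request body (hypothetical helper)."""
    return {
        "data": {
            "sort": "Mcap",
            "sorder": "desc",
            "count": count,
            "fields": FIELDS,
            "params": [
                {"field": "OgInst", "op": "", "val": "ES"},
                {"field": "Exch", "op": "", "val": "NSE"},
            ],
            "pgno": page,
        }
    }


def fetch_master_list(headers, timeout=30):
    """POST the payload and return the decoded JSON response.

    `headers` is an assumption: supply whatever the API requires.
    """
    body = json.dumps(build_scan_payload()).encode("utf-8")
    request = urllib.request.Request(
        SCAN_URL,
        data=body,
        headers={"Content-Type": "application/json", **headers},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping payload construction separate from the network call makes it possible to inspect or unit-test the request body without hitting the API.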

Output Files

| File | Description | Size | Records |
|---|---|---|---|
| dhan_data_response.json | Full API response with all stock data | ~3 MB | 2,775 |
| master_isin_map.json | Critical: Symbol ↔ ISIN ↔ Sid mapping | ~500 KB | 2,775 |

master_isin_map.json Structure

[
  {
    "Symbol": "RELIANCE",
    "ISIN": "INE002A01018",
    "Name": "Reliance Industries Ltd.",
    "Sid": "2885",
    "FnoFlag": 1
  }
]
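
A mapping file with this structure could be derived from the scan records along the following lines. This is a sketch under assumptions: it supposes that `Sym` becomes `Symbol` and `DispSym` becomes `Name`, which the request fields suggest but the source does not confirm, and `build_master_map` is a hypothetical name. Where the record list sits inside the API response is not documented here, so the function takes the list directly.

```python
def build_master_map(records):
    """Convert raw scan records into master_isin_map.json entries.

    Assumed field mapping: Sym -> Symbol, Isin -> ISIN, DispSym -> Name.
    """
    master = []
    for record in records:
        isin = record.get("Isin")
        symbol = record.get("Sym")
        # Skip rows that cannot be keyed; downstream scripts iterate by ISIN.
        if not isin or not symbol:
            continue
        master.append({
            "Symbol": symbol,
            "ISIN": isin,
            "Name": record.get("DispSym", ""),
            "Sid": str(record.get("Sid", "")),  # stored as a string in the example
            "FnoFlag": record.get("FnoFlag", 0),
        })
    return master


sample = [
    {"Sym": "RELIANCE", "Isin": "INE002A01018",
     "DispSym": "Reliance Industries Ltd.", "Sid": 2885, "FnoFlag": 1},
    {"Sym": "NOISIN", "Isin": None},
]
master_map = build_master_map(sample)
```

Records without a symbol or ISIN are dropped rather than written with empty keys, since every later script looks stocks up through this file.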

Dependencies

  • Requires: Internet connection, valid API headers
  • Depends on: None (foundation script)

Typical Execution Time

~5-10 seconds — Single API call fetching 2,775 stocks

Script 2: fetch_fundamental_data.py

Purpose

Fetches quarterly results, financial ratios, and TTM metrics for each stock, using the ISIN list in master_isin_map.json (produced by fetch_dhan_data.py).

API Endpoint

POST https://open-web-scanx.dhan.co/scanx/fundamental

Request Payload

{
  "data": {
    "isin": "INE002A01018"
  }
}

Data Fetched

  • Quarterly Results: Latest 4 quarters + YoY comparison
  • Income Statement: Revenue, Net Profit, OPM, EPS
  • Balance Sheet: Total Assets, Liabilities, Equity
  • Ratios: ROE, ROCE, Debt/Equity, P/E, P/B
  • TTM Metrics: Trailing twelve month calculations

Output Files

| File | Description | Size | Records |
|---|---|---|---|
| fundamental_data.json | Complete fundamental dataset | ~35 MB | 2,775 |

Output Structure

{
  "Symbol": "RELIANCE",
  "ISIN": "INE002A01018",
  "incomeStat_cq": {
    "Net_Profit": "17594|16446|15138|17955|12273",
    "Eps": "26.10|24.40|22.46|26.64|18.21"
  },
  "CV": {
    "PE": "28.5",
    "ROE": "8.2",
    "ROCE": "9.1"
  },
  "roce_roe": {...},
  "TTM_cy": {...}
}
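
The quarterly fields in this structure are pipe-delimited strings, so consumers need to split them before doing arithmetic. The sketch below shows one way to do that. It assumes the values are ordered newest first and that the fifth value is the same quarter one year earlier (consistent with "Latest 4 quarters + YoY comparison", but not confirmed by the source); `parse_series` and `yoy_growth_pct` are hypothetical helper names.

```python
def parse_series(raw):
    """Split a pipe-delimited string into floats; unparseable parts become None."""
    values = []
    for part in raw.split("|"):
        try:
            values.append(float(part.strip()))
        except ValueError:
            values.append(None)
    return values


def yoy_growth_pct(raw):
    """Year-over-year growth in percent.

    Assumes index 0 is the latest quarter and index 4 is the same
    quarter a year earlier. Returns None when either value is unusable.
    """
    values = parse_series(raw)
    if len(values) < 5:
        return None
    latest, year_ago = values[0], values[4]
    if latest is None or not year_ago:  # None or zero base
        return None
    return (latest - year_ago) / abs(year_ago) * 100


net_profit = parse_series("17594|16446|15138|17955|12273")
growth = yoy_growth_pct("17594|16446|15138|17955|12273")
```

With the sample Net_Profit string, and under the ordering assumption above, the growth works out to roughly 43%.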

Dependencies

  • Requires: master_isin_map.json (from fetch_dhan_data.py)
  • Timeout: 30 seconds per request
  • Threading: 20 concurrent workers
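
The threading setup listed above can be sketched with `concurrent.futures`. This is an illustration rather than the script's actual code: `fetch_all` is a hypothetical name, and the per-ISIN request is injected as `fetch_one` so that the 30-second timeout and the HTTP details stay in one place.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_all(isins, fetch_one, max_workers=20):
    """Run fetch_one(isin) across a thread pool.

    Returns (results, failures): results maps ISIN -> payload,
    failures maps ISIN -> error message. One failed request does
    not abort the rest of the batch.
    """
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch_one, isin): isin for isin in isins}
        for future in as_completed(futures):
            isin = futures[future]
            try:
                results[isin] = future.result()
            except Exception as exc:  # record and continue with other ISINs
                failures[isin] = str(exc)
    return results, failures


def fake_fetch(isin):
    """Stand-in for the real HTTP call, used only for this demonstration."""
    if isin == "BAD":
        raise TimeoutError("request timed out")
    return {"ISIN": isin}


ok, failed = fetch_all(["INE002A01018", "BAD"], fake_fetch, max_workers=2)
```

Collecting failures separately lets the caller write partial results to fundamental_data.json and retry or log the ISINs that timed out.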

Typical Execution Time

~2-3 minutes — Fetching 2,775 stocks with 20 threads

NSE Listing Dates Download

Purpose

Downloads the official NSE equity listing dates CSV for enrichment in Phase 3.

Command

curl -s -o nse_equity_list.csv \
  "https://nsearchives.nseindia.com/content/equities/EQUITY_L.csv" \
  --http1.1 \
  --header "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"

Output Files

| File | Description | Format |
|---|---|---|
| nse_equity_list.csv | Symbol → Listing Date mapping | CSV |

CSV Structure

SYMBOL, NAME OF COMPANY, DATE OF LISTING
RELIANCE,Reliance Industries Limited,29-NOV-1977
TCS,Tata Consultancy Services Limited,25-AUG-2004

Note: This download is non-critical. The pipeline continues even if it fails.
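
For the Phase 3 enrichment, the CSV can be turned into a symbol-to-date lookup. The sketch below assumes the three columns shown in the sample and the `DD-MON-YYYY` date format. The real file may contain additional columns, which `csv.DictReader` simply ignores here. `parse_listing_dates` is a hypothetical name.

```python
import csv
import io
from datetime import datetime


def parse_listing_dates(csv_text):
    """Return {symbol: date} from the listing CSV text.

    Rows with a missing symbol or an unparseable date are skipped.
    """
    # skipinitialspace handles headers written as "SYMBOL, NAME OF COMPANY, ..."
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    dates = {}
    for row in reader:
        symbol = (row.get("SYMBOL") or "").strip()
        raw_date = (row.get("DATE OF LISTING") or "").strip()
        if not symbol or not raw_date:
            continue
        try:
            # %b matches abbreviated month names case-insensitively (English locale).
            dates[symbol] = datetime.strptime(raw_date, "%d-%b-%Y").date()
        except ValueError:
            continue
    return dates


sample_csv = (
    "SYMBOL, NAME OF COMPANY, DATE OF LISTING\n"
    "RELIANCE,Reliance Industries Limited,29-NOV-1977\n"
    "TCS,Tata Consultancy Services Limited,25-AUG-2004\n"
    "BROKEN,Broken Row Limited,not-a-date\n"
)
listing_dates = parse_listing_dates(sample_csv)
```

Skipping bad rows instead of raising matches the non-critical status of this download: a malformed line costs one listing date, not the whole run.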

Phase 1 Output Summary

Files Produced

📦 Phase 1 Outputs:
├─ dhan_data_response.json        (~3 MB)
├─ master_isin_map.json           (~500 KB) ⚠️ CRITICAL
├─ fundamental_data.json          (~35 MB)
└─ nse_equity_list.csv            (~200 KB)

Critical Dependencies for Phase 2+

All Phase 2 scripts require master_isin_map.json to iterate through stocks:
  • fetch_company_filings.py
  • fetch_new_announcements.py
  • fetch_advanced_indicators.py
  • fetch_market_news.py
  • … and 6 more scripts

Error Handling

Critical Failure: fetch_dhan_data.py

If this script fails, the pipeline stops immediately:
results["fetch_dhan_data.py"] = run_script("fetch_dhan_data.py", "Phase 1")

if not results["fetch_dhan_data.py"]:
    print("🛑 CRITICAL: fetch_dhan_data.py failed. Cannot continue.")
    print("   This script produces master_isin_map.json which ALL other scripts need.")
    return

Non-Critical: Other Failures

  • fetch_fundamental_data.py fails: Pipeline continues, but fundamental fields will be empty
  • NSE CSV download fails: Pipeline continues, listing dates will be missing
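
Taken together, the critical and non-critical rules could be arranged as in the sketch below. It is an illustration, not the orchestrator's real code: `run_phase1` and `download_listing_dates` are hypothetical names, and `run_script` is assumed to return a truthy value on success, as the snippet above implies.

```python
def run_phase1(run_script, download_listing_dates):
    """Run the Phase 1 steps in order.

    Returns (results, can_continue). Only a fetch_dhan_data.py failure
    sets can_continue to False; the other steps are recorded but do not
    stop the pipeline.
    """
    results = {}

    results["fetch_dhan_data.py"] = run_script("fetch_dhan_data.py", "Phase 1")
    if not results["fetch_dhan_data.py"]:
        # master_isin_map.json is missing, so nothing downstream can run.
        return results, False

    # Non-critical: fundamental fields stay empty if this fails.
    results["fetch_fundamental_data.py"] = run_script(
        "fetch_fundamental_data.py", "Phase 1"
    )

    # Non-critical: listing dates will be missing if this fails.
    results["nse_equity_list.csv"] = download_listing_dates()

    return results, True


# Demonstration with stand-in callables instead of real scripts.
results, can_continue = run_phase1(
    lambda name, phase: name != "fetch_fundamental_data.py",
    lambda: True,
)
```

Returning the per-step results alongside the continue flag lets later phases report which Phase 1 inputs are incomplete.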

Performance Metrics

Total Phase 1 Time

~2-4 minutes for 2,775 stocks (including NSE CSV download)

Bottlenecks

  • fetch_fundamental_data.py: One API request per ISIN, subject to API rate limits (per-request latency is mitigated with 20 threads)
  • Network latency: Depends on connection speed

Optimization Tips

  1. Increase threading (if API allows):
    ThreadPoolExecutor(max_workers=30)  # Up from 20
    
  2. Cache master_isin_map.json between runs:
    import os
    if os.path.exists("master_isin_map.json"):
        print("Using cached master map...")  # skip the fetch and reuse the file
    
  3. Skip fundamental refetch for unchanged stocks (requires change detection)
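
The caching tip can be made a little safer by checking the file's age and contents before reusing it. The following is a sketch under assumptions: the 24-hour limit is an arbitrary illustrative value, and `load_cached_master_map` is a hypothetical helper, not part of the pipeline.

```python
import json
import os
import time


def load_cached_master_map(path="master_isin_map.json", max_age_hours=24):
    """Return the cached map if it is fresh and non-empty, else None.

    A None result signals that fetch_dhan_data.py should run again.
    """
    if not os.path.exists(path):
        return None

    age_seconds = time.time() - os.path.getmtime(path)
    if age_seconds > max_age_hours * 3600:
        return None  # stale: new listings may be missing

    try:
        with open(path, encoding="utf-8") as handle:
            data = json.load(handle)
    except (OSError, json.JSONDecodeError):
        return None  # unreadable or truncated file

    # The documented structure is a non-empty list of stock entries.
    if isinstance(data, list) and data:
        return data
    return None
```

A caller would fall back to the normal fetch whenever this returns None, so a corrupt or outdated cache never reaches the Phase 2 scripts.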

Next Phase

Once Phase 1 completes, the pipeline automatically proceeds to:

Phase 2: Data Enrichment

Fetches company filings, announcements, indicators, news, and surveillance data using the master ISIN map.